Similarity search in high-dimensional spaces is a pivotal operation found in a variety of database applications. Recently, there has been increased interest in similarity search for online content-based multimedia services. Those services, however, introduce new challenges with respect to the very large volumes of data that have to be indexed and searched, and the need to minimize the response times observed by end users. Additionally, those users interact dynamically with the systems, creating fluctuating query request rates and requiring the search algorithm to adapt in order to better utilize the underlying hardware and reduce response times. To address these challenges, we introduce Hypercurves, a flexible framework for answering approximate k-nearest neighbor (kNN) queries over very large multimedia databases, aimed at online content-based multimedia services. Hypercurves executes on hybrid CPU--GPU environments and is able to employ those devices cooperatively to support massive query request rates. To keep response times optimal as the request rates vary, it employs a novel dynamic scheduler to partition the work between CPU and GPU. Hypercurves was thoroughly evaluated using a large database of multimedia descriptors. Its cooperative CPU--GPU execution achieved performance improvements of up to 30x when compared to the single CPU-core version. The dynamic work partition mechanism reduces the observed query response times by about 50% when compared to the best static CPU--GPU task partition configuration. In addition, Hypercurves achieves superlinear scalability in distributed (multi-node) executions while keeping a high guarantee of equivalence with its sequential version, thanks to the proof of probabilistic equivalence that supported its aggressive parallelization design.